Overview

Dataset statistics

Number of variables: 24
Number of observations: 41221
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.5 MiB
Average record size in memory: 192.0 B

Variable types

Numeric: 12
Categorical: 12

Warnings

int_rate has a high cardinality: 394 distinct values [High cardinality]
earliest_cr_line has a high cardinality: 519 distinct values [High cardinality]
revol_util has a high cardinality: 1116 distinct values [High cardinality]
last_credit_pull_d has a high cardinality: 108 distinct values [High cardinality]
loan_amnt is highly correlated with installment [High correlation]
installment is highly correlated with loan_amnt [High correlation]
open_acc is highly correlated with total_acc [High correlation]
total_acc is highly correlated with open_acc [High correlation]
fico_average is highly correlated with grade [High correlation]
grade is highly correlated with fico_average [High correlation]
inq_last_6mths is highly correlated with df_index [High correlation]
loan_status is highly correlated with df_index [High correlation]
df_index is highly correlated with inq_last_6mths and 1 other field [High correlation]
annual_inc is highly skewed (γ1 = 29.30357761) [Skewed]
df_index is uniformly distributed [Uniform]
df_index has unique values [Unique]
delinq_2yrs has 36619 (88.8%) zeros [Zeros]
inq_last_6mths has 19037 (46.2%) zeros [Zeros]
pub_rec has 38988 (94.6%) zeros [Zeros]
revol_bal has 997 (2.4%) zeros [Zeros]

Reproduction

Analysis started: 2021-08-10 18:29:52.627874
Analysis finished: 2021-08-10 18:31:05.658475
Duration: 1 minute and 13.03 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json
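A report like this one could be regenerated roughly as follows. This is a minimal sketch, assuming the data lives in a CSV file; the function name `build_profile_report`, both file paths, and the report title are hypothetical and do not come from the report itself. `ProfileReport` and `to_file` are part of pandas-profiling, the software version listed above.

```python
def build_profile_report(csv_path, output_path="report.html"):
    """Profile a CSV file and write an HTML report (paths are hypothetical)."""
    # Imported lazily so that defining this function does not require the
    # third-party packages to be installed.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    # The title is an assumption; the original report title is not shown.
    report = ProfileReport(df, title="Loan data profile")
    report.to_file(output_path)
    return output_path
```

The settings actually used for this run are in the config.json file referenced above, so the defaults in this sketch may not reproduce the report exactly.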

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 41221
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 21312.42469
Minimum: 0
Maximum: 42449
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 322.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2116
Q1: 10682
Median: 21366
Q3: 31990
95-th percentile: 40368
Maximum: 42449
Range: 42449
Interquartile range (IQR): 21308

Descriptive statistics

Standard deviation: 12269.89789
Coefficient of variation (CV): 0.5757157183
Kurtosis: -1.203348579
Mean: 21312.42469
Median Absolute Deviation (MAD): 10655
Skewness: -0.01091485817
Sum: 878519458
Variance: 150550394.1
Monotonicity: Strictly increasing
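Several of the descriptive statistics above are related to each other, which makes a quick consistency check possible. The sketch below uses only the figures reported for df_index; the variable and function names are illustrative.

```python
import math

# Figures copied from the df_index statistics in this report.
N_ROWS = 41221
TOTAL = 878519458
MEAN = 21312.42469
STD = 12269.89789
VARIANCE = 150550394.1
REPORTED_CV = 0.5757157183


def coefficient_of_variation(std, mean):
    """CV is the standard deviation expressed as a fraction of the mean."""
    return std / mean


# Mean is the sum divided by the number of observations.
mean_from_sum = TOTAL / N_ROWS
# Variance is the square of the standard deviation.
variance_from_std = STD ** 2
cv = coefficient_of_variation(STD, MEAN)

print(math.isclose(mean_from_sum, MEAN, rel_tol=1e-6))
print(math.isclose(variance_from_std, VARIANCE, rel_tol=1e-6))
print(math.isclose(cv, REPORTED_CV, rel_tol=1e-6))
```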
Histogram with fixed size bins (bins=50)
Value                   Count   Frequency (%)
2047                    1       < 0.1%
42398                   1       < 0.1%
36219                   1       < 0.1%
34170                   1       < 0.1%
40313                   1       < 0.1%
38264                   1       < 0.1%
11631                   1       < 0.1%
9582                    1       < 0.1%
15725                   1       < 0.1%
13676                   1       < 0.1%
Other values (41211)    41211   > 99.9%

Value   Count   Frequency (%)
0       1       < 0.1%
1       1       < 0.1%
2       1       < 0.1%
3       1       < 0.1%
4       1       < 0.1%
5       1       < 0.1%
6       1       < 0.1%
7       1       < 0.1%
8       1       < 0.1%
9       1       < 0.1%

Value   Count   Frequency (%)
42449   1       < 0.1%
42448   1       < 0.1%
42446   1       < 0.1%
42445   1       < 0.1%
42444   1       < 0.1%
42443   1       < 0.1%
42442   1       < 0.1%
42441   1       < 0.1%
42440   1       < 0.1%
42439   1       < 0.1%

loan_amnt
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct893
Distinct (%)2.2%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean11184.61888
Minimum500
Maximum35000
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size322.2 KiB

Quantile statistics

Minimum500
5-th percentile2400
Q15500
median10000
Q315000
95-th percentile25000
Maximum35000
Range34500
Interquartile range (IQR)9500

Descriptive statistics

Standard deviation7417.438565
Coefficient of variation (CV)0.6631820576
Kurtosis0.7585805103
Mean11184.61888
Median Absolute Deviation (MAD)5000
Skewness1.054145051
Sum461041175
Variance55018394.86
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value                 Count   Frequency (%)
10000                 2940    7.1%
12000                 2393    5.8%
5000                  2150    5.2%
6000                  1977    4.8%
15000                 1977    4.8%
20000                 1696    4.1%
8000                  1646    4.0%
25000                 1475    3.6%
4000                  1185    2.9%
3000                  1074    2.6%
Other values (883)    22708   55.1%

Value   Count   Frequency (%)
500     11      < 0.1%
550     1       < 0.1%
600     6       < 0.1%
700     2       < 0.1%
725     1       < 0.1%
750     1       < 0.1%
800     3       < 0.1%
850     1       < 0.1%
900     4       < 0.1%
925     1       < 0.1%

Value   Count   Frequency (%)
35000   675     1.6%
34800   2       < 0.1%
34675   1       < 0.1%
34525   1       < 0.1%
34475   5       < 0.1%
34200   1       < 0.1%
34000   15      < 0.1%
33950   7       < 0.1%
33600   6       < 0.1%
33500   2       < 0.1%

term
Categorical

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size322.2 KiB
36 months
30501 
60 months
10720 

Length

Max length10
Median length10
Mean length10
Min length10

Characters and Unicode

Total characters412210
Distinct characters10
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row 36 months
2nd row 60 months
3rd row 36 months
4th row 36 months
5th row 60 months
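Because term has only two values and every value is 10 characters long (the sample rows show a leading space), it can be reduced to an integer number of months. A minimal sketch, with the hypothetical helper name `parse_term_months`:

```python
def parse_term_months(term):
    """Convert a string such as ' 36 months' to the integer 36."""
    # strip() drops the leading space seen in the sample rows;
    # the first whitespace-separated token is the number of months.
    return int(term.strip().split()[0])


print(parse_term_months(" 36 months"))
print(parse_term_months(" 60 months"))
```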

Common Values

Value       Count   Frequency (%)
36 months   30501   74.0%
60 months   10720   26.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
months41221
50.0%
3630501
37.0%
6010720
 
13.0%

Most occurring characters

ValueCountFrequency (%)
82442
20.0%
641221
10.0%
m41221
10.0%
o41221
10.0%
n41221
10.0%
t41221
10.0%
h41221
10.0%
s41221
10.0%
330501
 
7.4%
010720
 
2.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter247326
60.0%
Space Separator82442
 
20.0%
Decimal Number82442
 
20.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
m41221
16.7%
o41221
16.7%
n41221
16.7%
t41221
16.7%
h41221
16.7%
s41221
16.7%
Decimal Number
ValueCountFrequency (%)
641221
50.0%
330501
37.0%
010720
 
13.0%
Space Separator
ValueCountFrequency (%)
82442
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin247326
60.0%
Common164884
40.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
m41221
16.7%
o41221
16.7%
n41221
16.7%
t41221
16.7%
h41221
16.7%
s41221
16.7%
Common
ValueCountFrequency (%)
82442
50.0%
641221
25.0%
330501
 
18.5%
010720
 
6.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII412210
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
82442
20.0%
641221
10.0%
m41221
10.0%
o41221
10.0%
n41221
10.0%
t41221
10.0%
h41221
10.0%
s41221
10.0%
330501
 
7.4%
010720
 
2.6%

int_rate
Categorical

HIGH CARDINALITY

Distinct394
Distinct (%)1.0%
Missing0
Missing (%)0.0%
Memory size322.2 KiB
10.99%
 
946
13.49%
 
818
11.49%
 
812
7.51%
 
756
7.88%
 
715
Other values (389)
37174 

Length

Max length7
Median length7
Mean length7
Min length7

Characters and Unicode

Total characters288547
Distinct characters13
Distinct categories3 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique15 ?
Unique (%)< 0.1%

Sample

1st row 10.65%
2nd row 15.27%
3rd row 15.96%
4th row 13.49%
5th row 12.69%
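int_rate is reported as Categorical with high cardinality because its values are stored as padded strings ending in a percent sign. Converting them to numbers would let the profiler treat the column as numeric. The following is a sketch only; `parse_int_rate` is a hypothetical helper name.

```python
def parse_int_rate(text):
    """Convert a string such as ' 10.65%' to the float 10.65."""
    # Remove the padding spaces first, then the trailing percent sign.
    return float(text.strip().rstrip("%"))


print(parse_int_rate(" 10.65%"))
print(parse_int_rate("  7.51%"))
```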

Common Values

Value                 Count   Frequency (%)
10.99%                946     2.3%
13.49%                818     2.0%
11.49%                812     2.0%
7.51%                 756     1.8%
7.88%                 715     1.7%
7.49%                 633     1.5%
11.71%                591     1.4%
9.99%                 587     1.4%
7.90%                 559     1.4%
5.42%                 524     1.3%
Other values (384)    34280   83.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
10.99946
 
2.3%
13.49818
 
2.0%
11.49812
 
2.0%
7.51756
 
1.8%
7.88715
 
1.7%
7.49633
 
1.5%
11.71591
 
1.4%
9.99587
 
1.4%
7.90559
 
1.4%
5.42524
 
1.3%
Other values (384)34280
83.2%

Most occurring characters

ValueCountFrequency (%)
53007
18.4%
.41221
14.3%
%41221
14.3%
140505
14.0%
921898
7.6%
213314
 
4.6%
612450
 
4.3%
712400
 
4.3%
411683
 
4.0%
310668
 
3.7%
Other values (3)30180
10.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number153098
53.1%
Other Punctuation82442
28.6%
Space Separator53007
 
18.4%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
140505
26.5%
921898
14.3%
213314
 
8.7%
612450
 
8.1%
712400
 
8.1%
411683
 
7.6%
310668
 
7.0%
510561
 
6.9%
810032
 
6.6%
09587
 
6.3%
Other Punctuation
ValueCountFrequency (%)
.41221
50.0%
%41221
50.0%
Space Separator
ValueCountFrequency (%)
53007
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common288547
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
53007
18.4%
.41221
14.3%
%41221
14.3%
140505
14.0%
921898
7.6%
213314
 
4.6%
612450
 
4.3%
712400
 
4.3%
411683
 
4.0%
310668
 
3.7%
Other values (3)30180
10.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII288547
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
53007
18.4%
.41221
14.3%
%41221
14.3%
140505
14.0%
921898
7.6%
213314
 
4.6%
612450
 
4.3%
712400
 
4.3%
411683
 
4.0%
310668
 
3.7%
Other values (3)30180
10.5%

installment
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct16128
Distinct (%)39.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean325.4442859
Minimum15.67
Maximum1305.19
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size322.2 KiB

Quantile statistics

Minimum15.67
5-th percentile71.76
Q1167.57
median280.82
Q3432.26
95-th percentile768.19
Maximum1305.19
Range1289.52
Interquartile range (IQR)264.69

Descriptive statistics

Standard deviation209.2316031
Coefficient of variation (CV)0.6429106675
Kurtosis1.171865356
Mean325.4442859
Median Absolute Deviation (MAD)123.45
Skewness1.114593785
Sum13415138.91
Variance43777.86373
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
311.1168
 
0.2%
311.0254
 
0.1%
180.9653
 
0.1%
150.846
 
0.1%
368.4545
 
0.1%
372.1244
 
0.1%
317.7242
 
0.1%
339.3142
 
0.1%
330.7642
 
0.1%
186.6141
 
0.1%
Other values (16118)40744
98.8%
ValueCountFrequency (%)
15.671
< 0.1%
15.691
< 0.1%
15.751
< 0.1%
15.761
< 0.1%
15.911
< 0.1%
16.081
< 0.1%
16.251
< 0.1%
16.311
< 0.1%
16.471
< 0.1%
16.731
< 0.1%
ValueCountFrequency (%)
1305.191
 
< 0.1%
1302.691
 
< 0.1%
1295.211
 
< 0.1%
1288.12
< 0.1%
1283.51
 
< 0.1%
1276.63
< 0.1%
1272.21
 
< 0.1%
1269.734
< 0.1%
1265.161
 
< 0.1%
1263.231
 
< 0.1%

grade
Categorical

HIGH CORRELATION

Distinct7
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size322.2 KiB
B
12012 
A
9753 
C
8519 
D
5861 
E
3314 
Other values (2)
1762 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters41221
Distinct characters7
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowB
2nd rowC
3rd rowC
4th rowC
5th rowB

Common Values

Value   Count   Frequency (%)
B       12012   29.1%
A       9753    23.7%
C       8519    20.7%
D       5861    14.2%
E       3314    8.0%
F       1262    3.1%
G       500     1.2%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
b12012
29.1%
a9753
23.7%
c8519
20.7%
d5861
14.2%
e3314
 
8.0%
f1262
 
3.1%
g500
 
1.2%

Most occurring characters

ValueCountFrequency (%)
B12012
29.1%
A9753
23.7%
C8519
20.7%
D5861
14.2%
E3314
 
8.0%
F1262
 
3.1%
G500
 
1.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter41221
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
B12012
29.1%
A9753
23.7%
C8519
20.7%
D5861
14.2%
E3314
 
8.0%
F1262
 
3.1%
G500
 
1.2%

Most occurring scripts

ValueCountFrequency (%)
Latin41221
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
B12012
29.1%
A9753
23.7%
C8519
20.7%
D5861
14.2%
E3314
 
8.0%
F1262
 
3.1%
G500
 
1.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII41221
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
B12012
29.1%
A9753
23.7%
C8519
20.7%
D5861
14.2%
E3314
 
8.0%
F1262
 
3.1%
G500
 
1.2%

emp_length
Categorical

Distinct11
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size322.2 KiB
10+ years
9356 
< 1 year
4993 
2 years
4719 
3 years
4349 
4 years
3635 
Other values (6)
14169 

Length

Max length9
Median length7
Mean length7.488658693
Min length6

Characters and Unicode

Total characters308690
Distinct characters18
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row10+ years
2nd row< 1 year
3rd row10+ years
4th row10+ years
5th row1 year
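emp_length is ordinal in nature, so it may be more useful as a number of years than as a category. The sketch below maps "< 1 year" to 0 and "10+ years" to 10; both choices are assumptions made for illustration, not something the report prescribes, and `parse_emp_length` is a hypothetical helper name.

```python
import re


def parse_emp_length(text):
    """Map strings such as '10+ years' or '< 1 year' to whole years."""
    cleaned = text.strip()
    # Assumption: less than one year is counted as zero years.
    if cleaned.startswith("<"):
        return 0
    match = re.search(r"\d+", cleaned)
    if match is None:
        raise ValueError(f"unrecognised employment length: {text!r}")
    # Assumption: '10+ years' is capped at 10.
    return int(match.group())


print(parse_emp_length("10+ years"))
print(parse_emp_length("< 1 year"))
print(parse_emp_length("1 year"))
```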

Common Values

ValueCountFrequency (%)
10+ years9356
22.7%
< 1 year4993
12.1%
2 years4719
11.4%
3 years4349
10.6%
4 years3635
 
8.8%
1 year3562
 
8.6%
5 years3449
 
8.4%
6 years2368
 
5.7%
7 years1868
 
4.5%
8 years1586
 
3.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
years32666
37.4%
109356
 
10.7%
year8555
 
9.8%
18555
 
9.8%
4993
 
5.7%
24719
 
5.4%
34349
 
5.0%
43635
 
4.2%
53449
 
3.9%
62368
 
2.7%
Other values (3)4790
 
5.5%

Most occurring characters

ValueCountFrequency (%)
46214
15.0%
y41221
13.4%
e41221
13.4%
a41221
13.4%
r41221
13.4%
s32666
10.6%
117911
 
5.8%
09356
 
3.0%
+9356
 
3.0%
<4993
 
1.6%
Other values (8)23310
7.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter197550
64.0%
Decimal Number50577
 
16.4%
Space Separator46214
 
15.0%
Math Symbol14349
 
4.6%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
117911
35.4%
09356
18.5%
24719
 
9.3%
34349
 
8.6%
43635
 
7.2%
53449
 
6.8%
62368
 
4.7%
71868
 
3.7%
81586
 
3.1%
91336
 
2.6%
Lowercase Letter
ValueCountFrequency (%)
y41221
20.9%
e41221
20.9%
a41221
20.9%
r41221
20.9%
s32666
16.5%
Math Symbol
ValueCountFrequency (%)
+9356
65.2%
<4993
34.8%
Space Separator
ValueCountFrequency (%)
46214
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin197550
64.0%
Common111140
36.0%

Most frequent character per script

Common
ValueCountFrequency (%)
46214
41.6%
117911
 
16.1%
09356
 
8.4%
+9356
 
8.4%
<4993
 
4.5%
24719
 
4.2%
34349
 
3.9%
43635
 
3.3%
53449
 
3.1%
62368
 
2.1%
Other values (3)4790
 
4.3%
Latin
ValueCountFrequency (%)
y41221
20.9%
e41221
20.9%
a41221
20.9%
r41221
20.9%
s32666
16.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII308690
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
46214
15.0%
y41221
13.4%
e41221
13.4%
a41221
13.4%
r41221
13.4%
s32666
10.6%
117911
 
5.8%
09356
 
3.0%
+9356
 
3.0%
<4993
 
1.6%
Other values (8)23310
7.6%

home_ownership
Categorical

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size322.2 KiB
RENT
19649 
MORTGAGE
18425 
OWN
3011 
OTHER
 
134
NONE
 
2

Length

Max length8
Median length4
Mean length5.718129109
Min length3

Characters and Unicode

Total characters235707
Distinct characters10
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowRENT
2nd rowRENT
3rd rowRENT
4th rowRENT
5th rowRENT

Common Values

Value      Count   Frequency (%)
RENT       19649   47.7%
MORTGAGE   18425   44.7%
OWN        3011    7.3%
OTHER      134     0.3%
NONE       2       < 0.1%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
rent19649
47.7%
mortgage18425
44.7%
own3011
 
7.3%
other134
 
0.3%
none2
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
E38210
16.2%
R38208
16.2%
T38208
16.2%
G36850
15.6%
N22664
9.6%
O21572
9.2%
M18425
7.8%
A18425
7.8%
W3011
 
1.3%
H134
 
0.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter235707
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E38210
16.2%
R38208
16.2%
T38208
16.2%
G36850
15.6%
N22664
9.6%
O21572
9.2%
M18425
7.8%
A18425
7.8%
W3011
 
1.3%
H134
 
0.1%

Most occurring scripts

ValueCountFrequency (%)
Latin235707
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
E38210
16.2%
R38208
16.2%
T38208
16.2%
G36850
15.6%
N22664
9.6%
O21572
9.2%
M18425
7.8%
A18425
7.8%
W3011
 
1.3%
H134
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII235707
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
E38210
16.2%
R38208
16.2%
T38208
16.2%
G36850
15.6%
N22664
9.6%
O21572
9.2%
M18425
7.8%
A18425
7.8%
W3011
 
1.3%
H134
 
0.1%

annual_inc
Real number (ℝ≥0)

SKEWED

Distinct5365
Distinct (%)13.0%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean69771.55031
Minimum1896
Maximum6000000
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size322.2 KiB

Quantile statistics

Minimum1896
5-th percentile24000
Q141000
median60000
Q383400
95-th percentile145000
Maximum6000000
Range5998104
Interquartile range (IQR)42400

Descriptive statistics

Standard deviation64520.72619
Coefficient of variation (CV)0.9247426194
Kurtosis2126.31231
Mean69771.55031
Median Absolute Deviation (MAD)20000
Skewness29.30357761
Sum2876053075
Variance4162924108
MonotonicityNot monotonic
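The large skewness of annual_inc comes from a few very large incomes (the maximum is 6000000 against a median of 60000). A logarithmic transform is one common way to reduce that kind of skew. The sketch below computes the moment-based skewness on a toy sample built from the quantiles reported above; it is not the real column, and the report's own estimator may apply a bias correction, so the numbers are not comparable with the reported value.

```python
import math


def sample_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Toy sample: the 5th percentile, Q1, median, Q3, 95th percentile and
# maximum reported for annual_inc. Illustrative only.
incomes = [24000, 41000, 60000, 83400, 145000, 6000000]

raw_skew = sample_skewness(incomes)
log_skew = sample_skewness([math.log1p(v) for v in incomes])

print(raw_skew, log_skew)
```

On this toy sample the transformed values are less skewed than the raw ones, although a single extreme value still dominates.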
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
600001552
 
3.8%
500001090
 
2.6%
40000911
 
2.2%
45000874
 
2.1%
75000854
 
2.1%
30000846
 
2.1%
65000826
 
2.0%
70000775
 
1.9%
48000738
 
1.8%
80000706
 
1.7%
Other values (5355)32049
77.7%
ValueCountFrequency (%)
18961
 
< 0.1%
20001
 
< 0.1%
33001
 
< 0.1%
35001
 
< 0.1%
36001
 
< 0.1%
40001
 
< 0.1%
40801
 
< 0.1%
45001
 
< 0.1%
48002
< 0.1%
50003
< 0.1%
ValueCountFrequency (%)
60000001
 
< 0.1%
39000001
 
< 0.1%
20397841
 
< 0.1%
19000001
 
< 0.1%
17820001
 
< 0.1%
14400002
< 0.1%
13620001
 
< 0.1%
12500001
 
< 0.1%
12000004
< 0.1%
11760001
 
< 0.1%
Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size322.2 KiB
Not Verified
18132 
Verified
12995 
Source Verified
10094 

Length

Max length15
Median length12
Mean length11.47361782
Min length8

Characters and Unicode

Total characters472954
Distinct characters13
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowVerified
2nd rowSource Verified
3rd rowNot Verified
4th rowSource Verified
5th rowSource Verified

Common Values

ValueCountFrequency (%)
Not Verified18132
44.0%
Verified12995
31.5%
Source Verified10094
24.5%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
verified41221
59.4%
not18132
26.1%
source10094
 
14.5%

Most occurring characters

ValueCountFrequency (%)
e92536
19.6%
i82442
17.4%
r51315
10.8%
V41221
8.7%
f41221
8.7%
d41221
8.7%
o28226
 
6.0%
28226
 
6.0%
N18132
 
3.8%
t18132
 
3.8%
Other values (3)30282
 
6.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter375281
79.3%
Uppercase Letter69447
 
14.7%
Space Separator28226
 
6.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e92536
24.7%
i82442
22.0%
r51315
13.7%
f41221
11.0%
d41221
11.0%
o28226
 
7.5%
t18132
 
4.8%
u10094
 
2.7%
c10094
 
2.7%
Uppercase Letter
ValueCountFrequency (%)
V41221
59.4%
N18132
26.1%
S10094
 
14.5%
Space Separator
ValueCountFrequency (%)
28226
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin444728
94.0%
Common28226
 
6.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e92536
20.8%
i82442
18.5%
r51315
11.5%
V41221
9.3%
f41221
9.3%
d41221
9.3%
o28226
 
6.3%
N18132
 
4.1%
t18132
 
4.1%
S10094
 
2.3%
Other values (2)20188
 
4.5%
Common
ValueCountFrequency (%)
28226
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII472954
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e92536
19.6%
i82442
17.4%
r51315
10.8%
V41221
8.7%
f41221
8.7%
d41221
8.7%
o28226
 
6.0%
28226
 
6.0%
N18132
 
3.8%
t18132
 
3.8%
Other values (3)30282
 
6.4%

loan_status
Categorical

HIGH CORRELATION

Distinct9
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size322.2 KiB
Fully Paid
32676 
Charged Off
5401 
Does not meet the credit policy. Status:Fully Paid
 
1895
Does not meet the credit policy. Status:Charged Off
 
723
Current
 
494
Other values (4)
 
32

Length

Max length51
Median length10
Mean length12.65782004
Min length7

Characters and Unicode

Total characters521768
Distinct characters38
Distinct categories8 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)< 0.1%

Sample

1st rowFully Paid
2nd rowCharged Off
3rd rowFully Paid
4th rowFully Paid
5th rowCurrent

Common Values

Value                                                  Count   Frequency (%)
Fully Paid                                             32676   79.3%
Charged Off                                            5401    13.1%
Does not meet the credit policy. Status:Fully Paid     1895    4.6%
Does not meet the credit policy. Status:Charged Off    723     1.8%
Current                                                494     1.2%
In Grace Period                                        15      < 0.1%
Late (31-120 days)                                     12      < 0.1%
Late (16-30 days)                                      4       < 0.1%
Default                                                1       < 0.1%
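Two of the loan_status values combine a credit-policy flag with the actual status. Splitting them gives a cleaner status column plus a separate boolean. This is a sketch under that assumption; `split_loan_status` and `POLICY_PREFIX` are hypothetical names.

```python
POLICY_PREFIX = "Does not meet the credit policy. Status:"


def split_loan_status(status):
    """Return (status, meets_credit_policy) for a raw loan_status value."""
    cleaned = status.strip()
    if cleaned.startswith(POLICY_PREFIX):
        # Keep only the part after the prefix and record the policy flag.
        return cleaned[len(POLICY_PREFIX):].strip(), False
    return cleaned, True


print(split_loan_status("Does not meet the credit policy. Status:Fully Paid"))
print(split_loan_status("Charged Off"))
```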

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
paid34571
35.4%
fully32676
33.5%
off6124
 
6.3%
charged5401
 
5.5%
the2618
 
2.7%
credit2618
 
2.7%
policy2618
 
2.7%
does2618
 
2.7%
not2618
 
2.7%
meet2618
 
2.7%
Other values (11)3206
 
3.3%

Most occurring characters

ValueCountFrequency (%)
l71761
13.8%
56465
10.8%
a43361
8.3%
d43344
8.3%
i39822
 
7.6%
u37684
 
7.2%
y37205
 
7.1%
P34586
 
6.6%
F34571
 
6.6%
e19755
 
3.8%
Other values (28)103214
19.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter372761
71.4%
Uppercase Letter87182
 
16.7%
Space Separator56465
 
10.8%
Other Punctuation5236
 
1.0%
Decimal Number76
 
< 0.1%
Open Punctuation16
 
< 0.1%
Dash Punctuation16
 
< 0.1%
Close Punctuation16
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
l71761
19.3%
a43361
11.6%
d43344
11.6%
i39822
10.7%
u37684
10.1%
y37205
10.0%
e19755
 
5.3%
t16219
 
4.4%
f12249
 
3.3%
r9760
 
2.6%
Other values (8)41601
11.2%
Uppercase Letter
ValueCountFrequency (%)
P34586
39.7%
F34571
39.7%
C6618
 
7.6%
O6124
 
7.0%
D2619
 
3.0%
S2618
 
3.0%
L16
 
< 0.1%
I15
 
< 0.1%
G15
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
128
36.8%
316
21.1%
016
21.1%
212
15.8%
64
 
5.3%
Other Punctuation
ValueCountFrequency (%)
.2618
50.0%
:2618
50.0%
Space Separator
ValueCountFrequency (%)
56465
100.0%
Open Punctuation
ValueCountFrequency (%)
(16
100.0%
Dash Punctuation
ValueCountFrequency (%)
-16
100.0%
Close Punctuation
ValueCountFrequency (%)
)16
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin459943
88.2%
Common61825
 
11.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
l71761
15.6%
a43361
9.4%
d43344
9.4%
i39822
8.7%
u37684
8.2%
y37205
8.1%
P34586
7.5%
F34571
7.5%
e19755
 
4.3%
t16219
 
3.5%
Other values (17)81635
17.7%
Common
ValueCountFrequency (%)
56465
91.3%
.2618
 
4.2%
:2618
 
4.2%
128
 
< 0.1%
(16
 
< 0.1%
316
 
< 0.1%
-16
 
< 0.1%
016
 
< 0.1%
)16
 
< 0.1%
212
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII521768
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
l71761
13.8%
56465
10.8%
a43361
8.3%
d43344
8.3%
i39822
 
7.6%
u37684
 
7.2%
y37205
 
7.1%
P34586
 
6.6%
F34571
 
6.6%
e19755
 
3.8%
Other values (28)103214
19.8%

purpose
Categorical

Distinct14
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size322.2 KiB
debt_consolidation
19314 
credit_card
5309 
other
4205 
home_improvement
3086 
major_purchase
2231 
Other values (9)
7076 

Length

Max length18
Median length16
Mean length13.73071978
Min length3

Characters and Unicode

Total characters565994
Distinct characters22
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowcredit_card
2nd rowcar
3rd rowsmall_business
4th rowother
5th rowother

Common Values

Value                 Count   Frequency (%)
debt_consolidation    19314   46.9%
credit_card           5309    12.9%
other                 4205    10.2%
home_improvement      3086    7.5%
major_purchase        2231    5.4%
small_business        1938    4.7%
car                   1555    3.8%
wedding               989     2.4%
medical               724     1.8%
moving                597     1.4%
Other values (4)      1273    3.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
debt_consolidation19314
46.9%
credit_card5309
 
12.9%
other4205
 
10.2%
home_improvement3086
 
7.5%
major_purchase2231
 
5.4%
small_business1938
 
4.7%
car1555
 
3.8%
wedding989
 
2.4%
medical724
 
1.8%
moving597
 
1.4%
Other values (4)1273
 
3.1%

Most occurring characters

ValueCountFrequency (%)
o72322
12.8%
d52348
9.2%
i52036
9.2%
t51993
9.2%
n46199
8.2%
e45268
 
8.0%
c35207
 
6.2%
a34930
 
6.2%
_31976
 
5.6%
s29707
 
5.2%
Other values (12)114008
20.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter534018
94.4%
Connector Punctuation31976
 
5.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o72322
13.5%
d52348
9.8%
i52036
9.7%
t51993
9.7%
n46199
8.7%
e45268
8.5%
c35207
 
6.6%
a34930
 
6.5%
s29707
 
5.6%
l24412
 
4.6%
Other values (11)89596
16.8%
Connector Punctuation
ValueCountFrequency (%)
_31976
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin534018
94.4%
Common31976
 
5.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
o72322
13.5%
d52348
9.8%
i52036
9.7%
t51993
9.7%
n46199
8.7%
e45268
8.5%
c35207
 
6.6%
a34930
 
6.5%
s29707
 
5.6%
l24412
 
4.6%
Other values (11)89596
16.8%
Common
ValueCountFrequency (%)
_31976
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII565994
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o72322
12.8%
d52348
9.2%
i52036
9.2%
t51993
9.2%
n46199
8.2%
e45268
 
8.0%
c35207
 
6.2%
a34930
 
6.2%
_31976
 
5.6%
s29707
 
5.2%
Other values (12)114008
20.1%

addr_state
Categorical

Distinct50
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size322.2 KiB
CA
7228 
NY
3933 
FL
2985 
TX
2850 
NJ
 
1949
Other values (45)
22276 

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters82442
Distinct characters24
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowAZ
2nd rowGA
3rd rowIL
4th rowCA
5th rowOR

Common Values

ValueCountFrequency (%)
CA7228
17.5%
NY3933
 
9.5%
FL2985
 
7.2%
TX2850
 
6.9%
NJ1949
 
4.7%
IL1631
 
4.0%
PA1610
 
3.9%
VA1452
 
3.5%
GA1451
 
3.5%
MA1385
 
3.4%
Other values (40)14747
35.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
ca7228
17.5%
ny3933
 
9.5%
fl2985
 
7.2%
tx2850
 
6.9%
nj1949
 
4.7%
il1631
 
4.0%
pa1610
 
3.9%
va1452
 
3.5%
ga1451
 
3.5%
ma1385
 
3.4%
Other values (40)14747
35.8%

Most occurring characters

ValueCountFrequency (%)
A16118
19.6%
C10344
12.5%
N8245
10.0%
L5530
 
6.7%
M4925
 
6.0%
Y4368
 
5.3%
T4087
 
5.0%
O3620
 
4.4%
I3299
 
4.0%
F2985
 
3.6%
Other values (14)18921
23.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter82442
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A16118
19.6%
C10344
12.5%
N8245
10.0%
L5530
 
6.7%
M4925
 
6.0%
Y4368
 
5.3%
T4087
 
5.0%
O3620
 
4.4%
I3299
 
4.0%
F2985
 
3.6%
Other values (14)18921
23.0%

Most occurring scripts

ValueCountFrequency (%)
Latin82442
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
A16118
19.6%
C10344
12.5%
N8245
10.0%
L5530
 
6.7%
M4925
 
6.0%
Y4368
 
5.3%
T4087
 
5.0%
O3620
 
4.4%
I3299
 
4.0%
F2985
 
3.6%
Other values (14)18921
23.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII82442
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A16118
19.6%
C10344
12.5%
N8245
10.0%
L5530
 
6.7%
M4925
 
6.0%
Y4368
 
5.3%
T4087
 
5.0%
O3620
 
4.4%
I3299
 
4.0%
F2985
 
3.6%
Other values (14)18921
23.0%

dti
Real number (ℝ≥0)

Distinct2891
Distinct (%)7.0%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean13.40131972
Minimum0
Maximum29.99
Zeros187
Zeros (%)0.5%
Negative0
Negative (%)0.0%
Memory size322.2 KiB

Quantile statistics

Minimum0
5-th percentile2.14
Q18.24
median13.5
Q318.69
95-th percentile23.92
Maximum29.99
Range29.99
Interquartile range (IQR)10.45

Descriptive statistics

Standard deviation6.713904199
Coefficient of variation (CV)0.5009882863
Kurtosis-0.8488100863
Mean13.40131972
Median Absolute Deviation (MAD)5.22
Skewness-0.03250169309
Sum552415.8
Variance45.07650959
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0187
 
0.5%
1251
 
0.1%
1845
 
0.1%
19.245
 
0.1%
13.242
 
0.1%
16.841
 
0.1%
12.4840
 
0.1%
14.2937
 
0.1%
13.537
 
0.1%
4.836
 
0.1%
Other values (2881)40660
98.6%
ValueCountFrequency (%)
0187
0.5%
0.013
 
< 0.1%
0.025
 
< 0.1%
0.032
 
< 0.1%
0.043
 
< 0.1%
0.051
 
< 0.1%
0.061
 
< 0.1%
0.075
 
< 0.1%
0.085
 
< 0.1%
0.094
 
< 0.1%
ValueCountFrequency (%)
29.991
 
< 0.1%
29.961
 
< 0.1%
29.952
< 0.1%
29.933
< 0.1%
29.922
< 0.1%
29.91
 
< 0.1%
29.891
 
< 0.1%
29.881
 
< 0.1%
29.861
 
< 0.1%
29.851
 
< 0.1%

delinq_2yrs
Real number (ℝ≥0)

ZEROS

Distinct11
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean0.1523980495
Minimum0
Maximum11
Zeros36619
Zeros (%)88.8%
Negative0
Negative (%)0.0%
Memory size322.2 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile1
Maximum11
Range11
Interquartile range (IQR)0

Descriptive statistics

Standard deviation0.5090790329
Coefficient of variation (CV)3.340456354
Kurtosis43.60868145
Mean0.1523980495
Median Absolute Deviation (MAD)0
Skewness5.196366964
Sum6282
Variance0.2591614617
MonotonicityNot monotonic
2021-08-10T13:31:14.369592image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
Histogram with fixed size bins (bins=11)
ValueCountFrequency (%)
036619
88.8%
13499
 
8.5%
2746
 
1.8%
3238
 
0.6%
468
 
0.2%
526
 
0.1%
613
 
< 0.1%
76
 
< 0.1%
83
 
< 0.1%
112
 
< 0.1%
ValueCountFrequency (%)
036619
88.8%
13499
 
8.5%
2746
 
1.8%
3238
 
0.6%
468
 
0.2%
526
 
0.1%
613
 
< 0.1%
76
 
< 0.1%
83
 
< 0.1%
91
 
< 0.1%
ValueCountFrequency (%)
112
 
< 0.1%
91
 
< 0.1%
83
 
< 0.1%
76
 
< 0.1%
613
 
< 0.1%
526
 
0.1%
468
 
0.2%
3238
 
0.6%
2746
 
1.8%
13499
8.5%

earliest_cr_line
Categorical

HIGH CARDINALITY

Distinct: 519
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Memory size: 322.2 KiB

Oct-1999 | 387
Nov-1998 | 385
Oct-2000 | 359
Dec-1998 | 358
Nov-2000 | 337
Other values (514) | 39395

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 329768
Distinct characters: 33
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 34
Unique (%): 0.1%

Sample

1st row: Jan-1985
2nd row: Apr-1999
3rd row: Nov-2001
4th row: Feb-1996
5th row: Jan-1996

Common Values

Value | Count | Frequency (%)
Oct-1999 | 387 | 0.9%
Nov-1998 | 385 | 0.9%
Oct-2000 | 359 | 0.9%
Dec-1998 | 358 | 0.9%
Nov-2000 | 337 | 0.8%
Dec-1997 | 337 | 0.8%
Nov-1999 | 329 | 0.8%
Oct-1998 | 323 | 0.8%
Sep-2000 | 317 | 0.8%
Nov-1997 | 316 | 0.8%
Other values (509) | 37773 | 91.6%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
oct-1999 | 387 | 0.9%
nov-1998 | 385 | 0.9%
oct-2000 | 359 | 0.9%
dec-1998 | 358 | 0.9%
nov-2000 | 337 | 0.8%
dec-1997 | 337 | 0.8%
nov-1999 | 329 | 0.8%
oct-1998 | 323 | 0.8%
sep-2000 | 317 | 0.8%
nov-1997 | 316 | 0.8%
Other values (509) | 37773 | 91.6%

Most occurring characters

Value | Count | Frequency (%)
9 | 49971 | 15.2%
- | 41221 | 12.5%
0 | 35789 | 10.9%
1 | 29519 | 9.0%
2 | 18976 | 5.8%
e | 10881 | 3.3%
J | 9805 | 3.0%
u | 9667 | 2.9%
a | 9451 | 2.9%
8 | 8664 | 2.6%
Other values (23) | 105824 | 32.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 164884 | 50.0%
Lowercase Letter | 82442 | 25.0%
Uppercase Letter | 41221 | 12.5%
Dash Punctuation | 41221 | 12.5%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 10881 | 13.2%
u | 9667 | 11.7%
a | 9451 | 11.5%
c | 8448 | 10.2%
n | 6627 | 8.0%
p | 6584 | 8.0%
r | 5730 | 7.0%
t | 4268 | 5.2%
o | 4090 | 5.0%
v | 4090 | 5.0%
Other values (4) | 12606 | 15.3%

Decimal Number
Value | Count | Frequency (%)
9 | 49971 | 30.3%
0 | 35789 | 21.7%
1 | 29519 | 17.9%
2 | 18976 | 11.5%
8 | 8664 | 5.3%
7 | 4892 | 3.0%
4 | 4432 | 2.7%
5 | 4376 | 2.7%
6 | 4371 | 2.7%
3 | 3894 | 2.4%

Uppercase Letter
Value | Count | Frequency (%)
J | 9805 | 23.8%
A | 6301 | 15.3%
M | 5876 | 14.3%
O | 4268 | 10.4%
D | 4180 | 10.1%
N | 4090 | 9.9%
S | 3720 | 9.0%
F | 2981 | 7.2%

Dash Punctuation
Value | Count | Frequency (%)
- | 41221 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 206105 | 62.5%
Latin | 123663 | 37.5%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 10881 | 8.8%
J | 9805 | 7.9%
u | 9667 | 7.8%
a | 9451 | 7.6%
c | 8448 | 6.8%
n | 6627 | 5.4%
p | 6584 | 5.3%
A | 6301 | 5.1%
M | 5876 | 4.8%
r | 5730 | 4.6%
Other values (12) | 44293 | 35.8%

Common
Value | Count | Frequency (%)
9 | 49971 | 24.2%
- | 41221 | 20.0%
0 | 35789 | 17.4%
1 | 29519 | 14.3%
2 | 18976 | 9.2%
8 | 8664 | 4.2%
7 | 4892 | 2.4%
4 | 4432 | 2.2%
5 | 4376 | 2.1%
6 | 4371 | 2.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 329768 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
9 | 49971 | 15.2%
- | 41221 | 12.5%
0 | 35789 | 10.9%
1 | 29519 | 9.0%
2 | 18976 | 5.8%
e | 10881 | 3.3%
J | 9805 | 3.0%
u | 9667 | 2.9%
a | 9451 | 2.9%
8 | 8664 | 2.6%
Other values (23) | 105824 | 32.1%

inq_last_6mths
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 28
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.081317775
Minimum: 0
Maximum: 33
Zeros: 19037
Zeros (%): 46.2%
Negative: 0
Negative (%): 0.0%
Memory size: 322.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 1
Q3: 2
95-th percentile: 4
Maximum: 33
Range: 33
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.521441632
Coefficient of variation (CV): 1.407025453
Kurtosis: 31.41822687
Mean: 1.081317775
Median Absolute Deviation (MAD): 1
Skewness: 3.446584413
Sum: 44573
Variance: 2.314784639
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=28)

Value | Count | Frequency (%)
0 | 19037 | 46.2%
1 | 10909 | 26.5%
2 | 5824 | 14.1%
3 | 3094 | 7.5%
4 | 1025 | 2.5%
5 | 588 | 1.4%
6 | 325 | 0.8%
7 | 173 | 0.4%
8 | 112 | 0.3%
9 | 46 | 0.1%
Other values (18) | 88 | 0.2%

Value | Count | Frequency (%)
0 | 19037 | 46.2%
1 | 10909 | 26.5%
2 | 5824 | 14.1%
3 | 3094 | 7.5%
4 | 1025 | 2.5%
5 | 588 | 1.4%
6 | 325 | 0.8%
7 | 173 | 0.4%
8 | 112 | 0.3%
9 | 46 | 0.1%

Value | Count | Frequency (%)
33 | 1 | < 0.1%
32 | 1 | < 0.1%
31 | 1 | < 0.1%
28 | 1 | < 0.1%
27 | 1 | < 0.1%
25 | 1 | < 0.1%
24 | 2 | < 0.1%
20 | 1 | < 0.1%
19 | 2 | < 0.1%
18 | 3 | < 0.1%

open_acc
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 44
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.37323209
Minimum: 1
Maximum: 47
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 322.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 3
Q1: 6
median: 9
Q3: 12
95-th percentile: 18
Maximum: 47
Range: 46
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 4.487523337
Coefficient of variation (CV): 0.478759439
Kurtosis: 1.96594772
Mean: 9.37323209
Median Absolute Deviation (MAD): 3
Skewness: 1.045304521
Sum: 386374
Variance: 20.1378657
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=44)

Value | Count | Frequency (%)
7 | 4146 | 10.1%
8 | 4063 | 9.9%
6 | 4035 | 9.8%
9 | 3823 | 9.3%
10 | 3301 | 8.0%
5 | 3231 | 7.8%
11 | 2877 | 7.0%
4 | 2405 | 5.8%
12 | 2350 | 5.7%
13 | 2005 | 4.9%
Other values (34) | 8985 | 21.8%

Value | Count | Frequency (%)
1 | 34 | 0.1%
2 | 629 | 1.5%
3 | 1521 | 3.7%
4 | 2405 | 5.8%
5 | 3231 | 7.8%
6 | 4035 | 9.8%
7 | 4146 | 10.1%
8 | 4063 | 9.9%
9 | 3823 | 9.3%
10 | 3301 | 8.0%

Value | Count | Frequency (%)
47 | 1 | < 0.1%
46 | 1 | < 0.1%
44 | 1 | < 0.1%
42 | 1 | < 0.1%
41 | 1 | < 0.1%
39 | 1 | < 0.1%
38 | 2 | < 0.1%
37 | 1 | < 0.1%
36 | 2 | < 0.1%
35 | 4 | < 0.1%

pub_rec
Real number (ℝ≥0)

ZEROS

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.05645180854
Minimum: 0
Maximum: 5
Zeros: 38988
Zeros (%): 94.6%
Negative: 0
Negative (%): 0.0%
Memory size: 322.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 1
Maximum: 5
Range: 5
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.2427821039
Coefficient of variation (CV): 4.30069665
Kurtosis: 28.43678714
Mean: 0.05645180854
Median Absolute Deviation (MAD): 0
Skewness: 4.720084777
Sum: 2327
Variance: 0.05894314996
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=6)

Value | Count | Frequency (%)
0 | 38988 | 94.6%
1 | 2157 | 5.2%
2 | 62 | 0.2%
3 | 11 | < 0.1%
4 | 2 | < 0.1%
5 | 1 | < 0.1%

Value | Count | Frequency (%)
0 | 38988 | 94.6%
1 | 2157 | 5.2%
2 | 62 | 0.2%
3 | 11 | < 0.1%
4 | 2 | < 0.1%
5 | 1 | < 0.1%

Value | Count | Frequency (%)
5 | 1 | < 0.1%
4 | 2 | < 0.1%
3 | 11 | < 0.1%
2 | 62 | 0.2%
1 | 2157 | 5.2%
0 | 38988 | 94.6%

revol_bal
Real number (ℝ≥0)

ZEROS

Distinct: 22409
Distinct (%): 54.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 14404.05458
Minimum: 0
Maximum: 1207359
Zeros: 997
Zeros (%): 2.4%
Negative: 0
Negative (%): 0.0%
Memory size: 322.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 324
Q1: 3712
median: 8943
Q3: 17392
95-th percentile: 44711
Maximum: 1207359
Range: 1207359
Interquartile range (IQR): 13680

Descriptive statistics

Standard deviation: 22088.47451
Coefficient of variation (CV): 1.533490059
Kurtosis: 351.6086808
Mean: 14404.05458
Median Absolute Deviation (MAD): 6129
Skewness: 11.12198398
Sum: 593749534
Variance: 487900706.1
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
0 | 997 | 2.4%
255 | 14 | < 0.1%
298 | 14 | < 0.1%
1 | 12 | < 0.1%
682 | 11 | < 0.1%
400 | 10 | < 0.1%
39 | 10 | < 0.1%
1763 | 9 | < 0.1%
182 | 9 | < 0.1%
1159 | 9 | < 0.1%
Other values (22399) | 40126 | 97.3%

Value | Count | Frequency (%)
0 | 997 | 2.4%
1 | 12 | < 0.1%
2 | 6 | < 0.1%
3 | 7 | < 0.1%
4 | 3 | < 0.1%
5 | 7 | < 0.1%
6 | 9 | < 0.1%
7 | 5 | < 0.1%
8 | 5 | < 0.1%
9 | 7 | < 0.1%

Value | Count | Frequency (%)
1207359 | 1 | < 0.1%
952013 | 1 | < 0.1%
602519 | 1 | < 0.1%
508961 | 1 | < 0.1%
487589 | 1 | < 0.1%
465731 | 1 | < 0.1%
423189 | 1 | < 0.1%
407794 | 1 | < 0.1%
401941 | 1 | < 0.1%
394107 | 1 | < 0.1%

revol_util
Categorical

HIGH CARDINALITY

Distinct: 1116
Distinct (%): 2.7%
Missing: 0
Missing (%): 0.0%
Memory size: 322.2 KiB

0% | 1031
0.2% | 63
40.7% | 63
63% | 62
66.6% | 61
Other values (1111) | 39941

Length

Max length: 6
Median length: 5
Mean length: 4.647970695
Min length: 2

Characters and Unicode

Total characters: 191594
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 112
Unique (%): 0.3%

Sample

1st row: 83.7%
2nd row: 9.4%
3rd row: 98.5%
4th row: 21%
5th row: 53.9%

Common Values

Value | Count | Frequency (%)
0% | 1031 | 2.5%
0.2% | 63 | 0.2%
40.7% | 63 | 0.2%
63% | 62 | 0.2%
66.6% | 61 | 0.1%
70.4% | 60 | 0.1%
0.1% | 59 | 0.1%
37.6% | 59 | 0.1%
78.7% | 58 | 0.1%
66.7% | 58 | 0.1%
Other values (1106) | 39647 | 96.2%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
0 | 1031 | 2.5%
0.2 | 63 | 0.2%
40.7 | 63 | 0.2%
63 | 62 | 0.2%
66.6 | 61 | 0.1%
70.4 | 60 | 0.1%
0.1 | 59 | 0.1%
37.6 | 59 | 0.1%
66.7 | 58 | 0.1%
78.7 | 58 | 0.1%
Other values (1106) | 39647 | 96.2%

Most occurring characters

Value | Count | Frequency (%)
% | 41221 | 21.5%
. | 36179 | 18.9%
4 | 12533 | 6.5%
5 | 12529 | 6.5%
7 | 12496 | 6.5%
6 | 12484 | 6.5%
3 | 12300 | 6.4%
8 | 11983 | 6.3%
2 | 11955 | 6.2%
1 | 11465 | 6.0%
Other values (2) | 16449 | 8.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 114194 | 59.6%
Other Punctuation | 77400 | 40.4%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
4 | 12533 | 11.0%
5 | 12529 | 11.0%
7 | 12496 | 10.9%
6 | 12484 | 10.9%
3 | 12300 | 10.8%
8 | 11983 | 10.5%
2 | 11955 | 10.5%
1 | 11465 | 10.0%
9 | 11282 | 9.9%
0 | 5167 | 4.5%

Other Punctuation
Value | Count | Frequency (%)
% | 41221 | 53.3%
. | 36179 | 46.7%

Most occurring scripts

Value | Count | Frequency (%)
Common | 191594 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
% | 41221 | 21.5%
. | 36179 | 18.9%
4 | 12533 | 6.5%
5 | 12529 | 6.5%
7 | 12496 | 6.5%
6 | 12484 | 6.5%
3 | 12300 | 6.4%
8 | 11983 | 6.3%
2 | 11955 | 6.2%
1 | 11465 | 6.0%
Other values (2) | 16449 | 8.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 191594 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
% | 41221 | 21.5%
. | 36179 | 18.9%
4 | 12533 | 6.5%
5 | 12529 | 6.5%
7 | 12496 | 6.5%
6 | 12484 | 6.5%
3 | 12300 | 6.4%
8 | 11983 | 6.3%
2 | 11955 | 6.2%
1 | 11465 | 6.0%
Other values (2) | 16449 | 8.6%

total_acc
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 83
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 22.18177628
Minimum: 1
Maximum: 90
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 322.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 7
Q1: 13
median: 20
Q3: 29
95-th percentile: 44
Maximum: 90
Range: 89
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 11.57319276
Coefficient of variation (CV): 0.5217432821
Kurtosis: 0.6546840675
Mean: 22.18177628
Median Absolute Deviation (MAD): 8
Skewness: 0.8198893335
Sum: 914355
Variance: 133.9387906
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
15 | 1510 | 3.7%
16 | 1505 | 3.7%
17 | 1496 | 3.6%
14 | 1490 | 3.6%
20 | 1468 | 3.6%
18 | 1455 | 3.5%
21 | 1433 | 3.5%
13 | 1431 | 3.5%
19 | 1373 | 3.3%
12 | 1372 | 3.3%
Other values (73) | 26688 | 64.7%

Value | Count | Frequency (%)
1 | 20 | < 0.1%
2 | 36 | 0.1%
3 | 217 | 0.5%
4 | 457 | 1.1%
5 | 591 | 1.4%
6 | 719 | 1.7%
7 | 863 | 2.1%
8 | 1036 | 2.5%
9 | 1099 | 2.7%
10 | 1213 | 2.9%

Value | Count | Frequency (%)
90 | 1 | < 0.1%
87 | 1 | < 0.1%
81 | 1 | < 0.1%
80 | 1 | < 0.1%
79 | 2 | < 0.1%
78 | 1 | < 0.1%
77 | 1 | < 0.1%
76 | 1 | < 0.1%
75 | 2 | < 0.1%
74 | 1 | < 0.1%

last_credit_pull_d
Categorical

HIGH CARDINALITY

Distinct: 108
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 322.2 KiB

Sep-2016 | 15784
Mar-2016 | 836
Aug-2016 | 748
Feb-2013 | 688
Apr-2016 | 685
Other values (103) | 22480

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 329768
Distinct characters: 33
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3
Unique (%): < 0.1%

Sample

1st row: Sep-2016
2nd row: Sep-2016
3rd row: Sep-2016
4th row: Apr-2016
5th row: Sep-2016

Common Values

Value | Count | Frequency (%)
Sep-2016 | 15784 | 38.3%
Mar-2016 | 836 | 2.0%
Aug-2016 | 748 | 1.8%
Feb-2013 | 688 | 1.7%
Apr-2016 | 685 | 1.7%
Jul-2016 | 603 | 1.5%
Feb-2016 | 594 | 1.4%
Jun-2016 | 521 | 1.3%
Jan-2016 | 514 | 1.2%
May-2016 | 499 | 1.2%
Other values (98) | 19749 | 47.9%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
sep-2016 | 15784 | 38.3%
mar-2016 | 836 | 2.0%
aug-2016 | 748 | 1.8%
feb-2013 | 688 | 1.7%
apr-2016 | 685 | 1.7%
jul-2016 | 603 | 1.5%
feb-2016 | 594 | 1.4%
jun-2016 | 521 | 1.3%
jan-2016 | 514 | 1.2%
may-2016 | 499 | 1.2%
Other values (98) | 19749 | 47.9%

Most occurring characters

Value | Count | Frequency (%)
2 | 44554 | 13.5%
1 | 42943 | 13.0%
0 | 42422 | 12.9%
- | 41221 | 12.5%
e | 22111 | 6.7%
6 | 20784 | 6.3%
p | 19803 | 6.0%
S | 17574 | 5.3%
u | 6780 | 2.1%
a | 6558 | 2.0%
Other values (23) | 65018 | 19.7%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 164884 | 50.0%
Lowercase Letter | 82442 | 25.0%
Uppercase Letter | 41221 | 12.5%
Dash Punctuation | 41221 | 12.5%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 22111 | 26.8%
p | 19803 | 24.0%
u | 6780 | 8.2%
a | 6558 | 8.0%
r | 4949 | 6.0%
n | 3729 | 4.5%
c | 3694 | 4.5%
b | 2589 | 3.1%
g | 2462 | 3.0%
l | 2326 | 2.8%
Other values (4) | 7441 | 9.0%

Decimal Number
Value | Count | Frequency (%)
2 | 44554 | 27.0%
1 | 42943 | 26.0%
0 | 42422 | 25.7%
6 | 20784 | 12.6%
4 | 5144 | 3.1%
5 | 4521 | 2.7%
3 | 4166 | 2.5%
9 | 264 | 0.2%
8 | 59 | < 0.1%
7 | 27 | < 0.1%

Uppercase Letter
Value | Count | Frequency (%)
S | 17574 | 42.6%
J | 6055 | 14.7%
M | 4821 | 11.7%
A | 4691 | 11.4%
F | 2589 | 6.3%
D | 1948 | 4.7%
N | 1797 | 4.4%
O | 1746 | 4.2%

Dash Punctuation
Value | Count | Frequency (%)
- | 41221 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 206105 | 62.5%
Latin | 123663 | 37.5%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 22111 | 17.9%
p | 19803 | 16.0%
S | 17574 | 14.2%
u | 6780 | 5.5%
a | 6558 | 5.3%
J | 6055 | 4.9%
r | 4949 | 4.0%
M | 4821 | 3.9%
A | 4691 | 3.8%
n | 3729 | 3.0%
Other values (12) | 26592 | 21.5%

Common
Value | Count | Frequency (%)
2 | 44554 | 21.6%
1 | 42943 | 20.8%
0 | 42422 | 20.6%
- | 41221 | 20.0%
6 | 20784 | 10.1%
4 | 5144 | 2.5%
5 | 4521 | 2.2%
3 | 4166 | 2.0%
9 | 264 | 0.1%
8 | 59 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 329768 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
2 | 44554 | 13.5%
1 | 42943 | 13.0%
0 | 42422 | 12.9%
- | 41221 | 12.5%
e | 22111 | 6.7%
6 | 20784 | 6.3%
p | 19803 | 6.0%
S | 17574 | 5.3%
u | 6780 | 2.1%
a | 6558 | 2.0%
Other values (23) | 65018 | 19.7%

fico_average
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 44
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 714.7798937
Minimum: 612
Maximum: 827
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 322.2 KiB

Quantile statistics

Minimum: 612
5-th percentile: 667
Q1: 687
median: 712
Q3: 742
95-th percentile: 782
Maximum: 827
Range: 215
Interquartile range (IQR): 55

Descriptive statistics

Standard deviation: 35.97754636
Coefficient of variation (CV): 0.05033374144
Kurtosis: -0.49297322
Mean: 714.7798937
Median Absolute Deviation (MAD): 25
Skewness: 0.4671441834
Sum: 29463942
Variance: 1294.383842
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=44)

Value | Count | Frequency (%)
687 | 2246 | 5.4%
702 | 2208 | 5.4%
682 | 2162 | 5.2%
697 | 2151 | 5.2%
692 | 2136 | 5.2%
677 | 1951 | 4.7%
707 | 1902 | 4.6%
722 | 1888 | 4.6%
727 | 1842 | 4.5%
717 | 1842 | 4.5%
Other values (34) | 20893 | 50.7%

Value | Count | Frequency (%)
612 | 1 | < 0.1%
617 | 1 | < 0.1%
622 | 1 | < 0.1%
627 | 1 | < 0.1%
632 | 4 | < 0.1%
637 | 3 | < 0.1%
642 | 94 | 0.2%
647 | 105 | 0.3%
652 | 125 | 0.3%
657 | 124 | 0.3%

Value | Count | Frequency (%)
827 | 2 | < 0.1%
822 | 17 | < 0.1%
817 | 24 | 0.1%
812 | 114 | 0.3%
807 | 177 | 0.4%
802 | 232 | 0.6%
797 | 321 | 0.8%
792 | 403 | 1.0%
787 | 377 | 0.9%
782 | 539 | 1.3%

Interactions

[144 interaction plots, one per ordered pair of the 12 numeric variables; images not preserved.]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the steepness of the line (its angle to the x-axis) does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
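
The calculation just described can be sketched in a few lines of plain Python. This is a minimal illustration rather than the code the report generator runs; the function name `pearson_r` and the toy values are assumptions made for the example and are not rows from the profiled dataset.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population covariance and standard deviations; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if std_x == 0 or std_y == 0:
        raise ValueError("r is undefined when one input is constant")
    return cov / (std_x * std_y)


# Hypothetical toy values: the second list is an exact linear function of the first.
loan = [1000.0, 2000.0, 3000.0, 4000.0]
installment = [35.0, 65.0, 95.0, 125.0]

r_linear = pearson_r(loan, installment)
# Shifting and rescaling one input should leave r unchanged.
r_rescaled = pearson_r([x / 1000 + 5 for x in loan], installment)
```

Shifting or rescaling either input by a positive factor leaves the result unchanged, which is the location and scale invariance mentioned above.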

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
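
As a rough sketch of that recipe, the plain-Python example below ranks both inputs (tied values share their average rank) and then applies the Pearson formula to the ranks. The helper names `average_ranks`, `pearson_r` and `spearman_rho` and the toy data are assumptions for illustration only.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical toy data: y grows monotonically but not linearly with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 3 for v in x]

rho = spearman_rho(x, y)  # monotonic relationship is captured fully
r = pearson_r(x, y)       # linear measure is below 1 for the same data
```

On a monotonic but nonlinear relationship such as this one, ρ reaches its maximum while r does not, which is the difference the paragraph above describes.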

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is then the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
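
The pair-counting definition can be sketched directly in plain Python. The function name `kendall_tau_a` and the toy lists are assumptions for illustration. This is the simple form with no tie adjustment; statistical libraries commonly report a tie-adjusted variant (tau-b), so values for heavily tied columns may differ from what this sketch returns.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total pairs, with no adjustment for ties."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in x or y; it counts as neither
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Hypothetical toy data: one swapped pair out of six.
tau_example = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
```

With four observations there are six pairs; one of them is discordant in the toy data, so the result is (5 - 1) / 6.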

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
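
A minimal plain-Python sketch of a bias-corrected Cramér's V is given below. It follows the commonly cited form of the Bergsma (2013) correction and is an assumption about the formula, not a copy of the report generator's implementation; the function name `cramers_v_corrected` and the toy categories are hypothetical.

```python
import math
from collections import Counter


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equal-length sequences of category labels."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Correction terms: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # one variable has a single category: no association measurable
    return math.sqrt(phi2_corr / denom)


# Hypothetical toy data.
perfect = cramers_v_corrected(["A", "B", "C"] * 20, ["x", "y", "z"] * 20)
independent = cramers_v_corrected(["A", "A", "B", "B"] * 10, ["x", "y", "x", "y"] * 10)
```

In the toy data, a one-to-one mapping between categories gives a value of 1, and a perfectly balanced, unrelated pair gives 0, matching the two ends of the range described above.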

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completeness.

Sample

First rows

row | df_index | loan_amnt | term | int_rate | installment | grade | emp_length | home_ownership | annual_inc | verification_status | loan_status | purpose | addr_state | dti | delinq_2yrs | earliest_cr_line | inq_last_6mths | open_acc | pub_rec | revol_bal | revol_util | total_acc | last_credit_pull_d | fico_average
0 | 0 | 5000.0 | 36 months | 10.65% | 162.87 | B | 10+ years | RENT | 24000.0 | Verified | Fully Paid | credit_card | AZ | 27.65 | 0.0 | Jan-1985 | 1.0 | 3.0 | 0.0 | 13648.0 | 83.7% | 9.0 | Sep-2016 | 737.0
1 | 1 | 2500.0 | 60 months | 15.27% | 59.83 | C | < 1 year | RENT | 30000.0 | Source Verified | Charged Off | car | GA | 1.00 | 0.0 | Apr-1999 | 5.0 | 3.0 | 0.0 | 1687.0 | 9.4% | 4.0 | Sep-2016 | 742.0
2 | 2 | 2400.0 | 36 months | 15.96% | 84.33 | C | 10+ years | RENT | 12252.0 | Not Verified | Fully Paid | small_business | IL | 8.72 | 0.0 | Nov-2001 | 2.0 | 2.0 | 0.0 | 2956.0 | 98.5% | 10.0 | Sep-2016 | 737.0
3 | 3 | 10000.0 | 36 months | 13.49% | 339.31 | C | 10+ years | RENT | 49200.0 | Source Verified | Fully Paid | other | CA | 20.00 | 0.0 | Feb-1996 | 1.0 | 10.0 | 0.0 | 5598.0 | 21% | 37.0 | Apr-2016 | 692.0
4 | 4 | 3000.0 | 60 months | 12.69% | 67.79 | B | 1 year | RENT | 80000.0 | Source Verified | Current | other | OR | 17.94 | 0.0 | Jan-1996 | 0.0 | 15.0 | 0.0 | 27783.0 | 53.9% | 38.0 | Sep-2016 | 697.0
5 | 5 | 5000.0 | 36 months | 7.90% | 156.46 | A | 3 years | RENT | 36000.0 | Source Verified | Fully Paid | wedding | AZ | 11.20 | 0.0 | Nov-2004 | 3.0 | 9.0 | 0.0 | 7963.0 | 28.3% | 12.0 | Jan-2016 | 732.0
6 | 6 | 7000.0 | 60 months | 15.96% | 170.08 | C | 8 years | RENT | 47004.0 | Not Verified | Fully Paid | debt_consolidation | NC | 23.51 | 0.0 | Jul-2005 | 1.0 | 7.0 | 0.0 | 17726.0 | 85.6% | 11.0 | Sep-2016 | 692.0
7 | 7 | 3000.0 | 36 months | 18.64% | 109.43 | E | 9 years | RENT | 48000.0 | Source Verified | Fully Paid | car | CA | 5.35 | 0.0 | Jan-2007 | 2.0 | 4.0 | 0.0 | 8221.0 | 87.5% | 4.0 | Dec-2014 | 662.0
8 | 8 | 5600.0 | 60 months | 21.28% | 152.39 | F | 4 years | OWN | 40000.0 | Source Verified | Charged Off | small_business | CA | 5.55 | 0.0 | Apr-2004 | 2.0 | 11.0 | 0.0 | 5210.0 | 32.6% | 13.0 | Sep-2016 | 677.0
9 | 9 | 5375.0 | 60 months | 12.69% | 121.45 | B | < 1 year | RENT | 15000.0 | Verified | Charged Off | other | TX | 18.08 | 0.0 | Sep-2004 | 0.0 | 2.0 | 0.0 | 9279.0 | 36.5% | 3.0 | Sep-2016 | 727.0

Last rows

(index) | df_index | loan_amnt | term | int_rate | installment | grade | emp_length | home_ownership | annual_inc | verification_status | loan_status | purpose | addr_state | dti | delinq_2yrs | earliest_cr_line | inq_last_6mths | open_acc | pub_rec | revol_bal | revol_util | total_acc | last_credit_pull_d | fico_average
41211 | 42439 | 11625.0 | 36 months | 15.01% | 403.07 | F | 1 year | RENT | 32500.0 | Not Verified | Does not meet the credit policy. Status:Fully Paid | small_business | NY | 0.74 | 0.0 | Dec-2006 | 4.0 | 2.0 | 0.0 | 381.0 | 63.5% | 2.0 | Sep-2010 | 647.0
41212 | 42440 | 2500.0 | 36 months | 7.43% | 77.69 | A | < 1 year | RENT | 22800.0 | Not Verified | Does not meet the credit policy. Status:Charged Off | moving | GA | 0.53 | 0.0 | Sep-1997 | 5.0 | 1.0 | 0.0 | 416.0 | 10.4% | 2.0 | Jul-2010 | 747.0
41213 | 42441 | 11050.0 | 36 months | 15.96% | 388.28 | F | 3 years | RENT | 27716.0 | Not Verified | Does not meet the credit policy. Status:Charged Off | debt_consolidation | AZ | 12.90 | 0.0 | Jun-1997 | 3.0 | 9.0 | 0.0 | 2621.0 | 51.5% | 15.0 | Sep-2016 | 647.0
41214 | 42442 | 3000.0 | 36 months | 12.49% | 100.35 | D | < 1 year | OWN | 65000.0 | Not Verified | Does not meet the credit policy. Status:Fully Paid | credit_card | NY | 14.25 | 0.0 | Jul-1999 | 8.0 | 17.0 | 0.0 | 8143.0 | 60.3% | 24.0 | Dec-2009 | 667.0
41215 | 42443 | 6000.0 | 36 months | 14.70% | 207.11 | E | 1 year | MORTGAGE | 22000.0 | Not Verified | Does not meet the credit policy. Status:Fully Paid | debt_consolidation | NH | 20.00 | 0.0 | Jun-2000 | 19.0 | 17.0 | 0.0 | 15782.0 | 36.2% | 17.0 | Sep-2016 | 667.0
41216 | 42444 | 1500.0 | 36 months | 11.86% | 49.72 | D | 5 years | RENT | 28000.0 | Not Verified | Does not meet the credit policy. Status:Fully Paid | other | FL | 14.31 | 1.0 | Feb-2006 | 1.0 | 1.0 | 0.0 | 0.0 | 0% | 2.0 | Oct-2010 | 667.0
41217 | 42445 | 3000.0 | 36 months | 8.38% | 94.54 | A | < 1 year | RENT | 20000.0 | Not Verified | Does not meet the credit policy. Status:Fully Paid | educational | NY | 6.72 | 0.0 | Dec-1998 | 9.0 | 4.0 | 0.0 | 7021.0 | 27.4% | 4.0 | Jun-2016 | 732.0
41218 | 42446 | 4500.0 | 36 months | 8.07% | 141.15 | A | < 1 year | RENT | 18240.0 | Not Verified | Does not meet the credit policy. Status:Fully Paid | other | GA | 3.29 | 0.0 | Apr-2004 | 1.0 | 1.0 | 0.0 | 0.0 | 0% | 2.0 | Oct-2013 | 737.0
41219 | 42448 | 15000.0 | 36 months | 12.17% | 499.45 | D | 1 year | MORTGAGE | 83200.0 | Not Verified | Does not meet the credit policy. Status:Fully Paid | credit_card | WI | 17.02 | 0.0 | Oct-1995 | 5.0 | 14.0 | 0.0 | 37570.0 | 59.5% | 37.0 | Apr-2015 | 712.0
41220 | 42449 | 5000.0 | 36 months | 10.91% | 163.48 | C | < 1 year | RENT | 42500.0 | Not Verified | Does not meet the credit policy. Status:Fully Paid | credit_card | FL | 1.21 | 0.0 | Aug-2005 | 1.0 | 2.0 | 0.0 | 1424.0 | 43.2% | 3.0 | Oct-2010 | 677.0